Add Apple-Music-Scraper Python Script#3265
Abssdghi wants to merge 29 commits into avinashkranjan:main from
Conversation
Reviewer's Guide

Adds a new Apple Music web scraping module that parses Apple Music’s serialized-server-data and JSON-LD blocks to provide structured song, album, playlist, artist, video, room, and search results, supported by shared utilities for artwork URL formatting, URL conversion, and fetching singles/EPs, along with documentation and dependencies.

Sequence diagram for the Apple Music search-to-latest-song workflow

sequenceDiagram
actor "User" as User
participant "Client Script" as Client
participant "main.search()" as Search
participant "main.artist_scrape()" as ArtistScrape
participant "main.album_scrape()" as AlbumScrape
participant "utils.get_cover()" as GetCover
participant "utils.get_all_singles()" as GetAllSingles
participant "Apple Music Web" as AppleWeb
"User" ->> "Client Script": "Call search('night tapes')"
"Client Script" ->> "main.search()": "search(keyword)"
"main.search()" ->> "Apple Music Web": "GET https://music.apple.com/us/search?term=keyword"
"Apple Music Web" -->> "main.search()": "HTML with 'serialized-server-data' script"
"main.search()" ->> "main.search()": "Parse HTML with BeautifulSoup"
"main.search()" ->> "main.search()": "json.loads(serialized-server-data)"
"main.search()" ->> "utils.get_cover()": "Build artwork URL for each result"
"utils.get_cover()" -->> "main.search()": "Formatted artwork URL"
"main.search()" -->> "Client Script": "Structured search results dict"
"Client Script" ->> "main.artist_scrape()": "artist_scrape(artist_url)"
"main.artist_scrape()" ->> "Apple Music Web": "GET artist page HTML"
"Apple Music Web" -->> "main.artist_scrape()": "HTML with 'serialized-server-data'"
"main.artist_scrape()" ->> "main.artist_scrape()": "Parse and extract sections (detail, latest, top, etc.)"
"main.artist_scrape()" ->> "utils.get_cover()": "Build artist artwork URL"
"utils.get_cover()" -->> "main.artist_scrape()": "Formatted artwork URL"
"main.artist_scrape()" ->> "utils.get_all_singles()": "get_all_singles(artist_url)"
"utils.get_all_singles()" ->> "Apple Music Web": "GET artist/see-all?section=singles"
"Apple Music Web" -->> "utils.get_all_singles()": "HTML with singles section"
"utils.get_all_singles()" ->> "utils.get_all_singles()": "Parse serialized-server-data and items"
"utils.get_all_singles()" -->> "main.artist_scrape()": "List of singles and EP URLs"
"main.artist_scrape()" -->> "Client Script": "Artist metadata dict (including 'latest' URL)"
"Client Script" ->> "main.album_scrape()": "album_scrape(latest_song_album_url)"
"main.album_scrape()" ->> "Apple Music Web": "GET album page HTML"
"Apple Music Web" -->> "main.album_scrape()": "HTML with 'serialized-server-data'"
"main.album_scrape()" ->> "main.album_scrape()": "Parse sections (album-detail, track-list, etc.)"
"main.album_scrape()" ->> "utils.get_cover()": "Build album artwork URL"
"utils.get_cover()" -->> "main.album_scrape()": "Formatted artwork URL"
"main.album_scrape()" -->> "Client Script": "Album metadata dict (title, image, songs, more, similar)"
"Client Script" -->> "User": "Display latest song title and cover art"
Class diagram for the new Apple Music scraper and utilities

classDiagram
class MainScraper {
+room_scrape(link="https://music.apple.com/us/room/6748797380") list~str~
+playlist_scrape(link="https://music.apple.com/us/playlist/new-music-daily/pl.2b0e6e332fdf4b7a91164da3162127b5") list~str~
+search(keyword="sasha sloan") dict
+song_scrape(url="https://music.apple.com/us/song/california/1821538031") dict
+album_scrape(url="https://music.apple.com/us/album/1965/1817707266?i=1817707585") dict
+video_scrape(url="https://music.apple.com/us/music-video/gucci-mane-visualizer/1810547026") dict
+artist_scrape(url="https://music.apple.com/us/artist/king-princess/1349968534") dict
}
class Utils {
+get_cover(url, width, height, format="jpg", crop_option="") str
+convert_album_to_song_url(album_url) str
+get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534") list~str~
}
MainScraper ..> Utils : "uses 'get_cover' for artwork URLs"
MainScraper ..> Utils : "uses 'convert_album_to_song_url' in room_scrape, playlist_scrape, album_scrape"
MainScraper ..> Utils : "uses 'get_all_singles' inside artist_scrape"
Flow diagram for generic Apple Music page scraping using serialized-server-data

flowchart TD
A["Start scraping function (song_scrape, album_scrape, video_scrape, artist_scrape, room_scrape, playlist_scrape, search)"] --> B["Build target Apple Music URL (page-specific)"]
B["Build target Apple Music URL (page-specific)"] --> C["Set headers with 'User-Agent: Mozilla/5.0'"]
C["Set headers with 'User-Agent: Mozilla/5.0'"] --> D["requests.get(URL, headers=headers)"]
D["requests.get(URL, headers=headers)"] --> E["Parse HTML with BeautifulSoup"]
E["Parse HTML with BeautifulSoup"] --> F{"Find script tag with id 'serialized-server-data'?"}
F{"Find script tag with id 'serialized-server-data'?"} -->|"Yes"| G["Extract script text and load JSON via json.loads"]
F{"Find script tag with id 'serialized-server-data'?"} -->|"No"| Z["Return empty or partial result (error or structure change)"]
G["Extract script text and load JSON via json.loads"] --> H["Access our_json[0]['data']['sections']"]
H["Access our_json[0]['data']['sections']"] --> I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"}
I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"} --> J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"]
J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"] --> K{"Artwork present in item?"}
K{"Artwork present in item?"} -->|"Yes"| L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"]
K{"Artwork present in item?"} -->|"No"| M["Set artwork field to empty string"]
L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"] --> N["Attach formatted artwork URL to result object"]
M["Set artwork field to empty string"] --> N["Attach formatted artwork URL to result object"]
N["Attach formatted artwork URL to result object"] --> O{"Needs additional JSON-LD (preview or video URL)?"}
O{"Needs additional JSON-LD (preview or video URL)?"} -->|"Yes"| P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"]
O{"Needs additional JSON-LD (preview or video URL)?"} -->|"No"| R["Skip JSON-LD step"]
P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"] --> Q["Extract preview or video content URL and add to result"]
Q["Extract preview or video content URL and add to result"] --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
R["Skip JSON-LD step"] --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"] --> T["Return JSON-like Python structure to caller"]
Z["Return empty or partial result (error or structure change)"] --> T["Return JSON-like Python structure to caller"]
Hey there - I've reviewed your changes - here's some feedback:
- There are many `try/except: pass` blocks throughout the scraper; it would be more robust to catch specific exceptions (e.g., `KeyError`, `IndexError`, `JSONDecodeError`) and optionally log or default values so that real failures aren’t silently swallowed.
- HTTP requests currently don’t specify timeouts or handle network-level errors; consider adding a shared request helper with a reasonable timeout and basic error handling/retries to avoid hanging or crashing on transient network issues.
- In several places when extracting artwork (e.g., in `search()` for artists/albums/songs/playlists/videos), you access `i[0]['artwork']` instead of `i['artwork']`, which is likely a typo and causes exceptions that are then swallowed; clean this up so artwork URLs are reliably parsed.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There are many `try/except: pass` blocks throughout the scraper; it would be more robust to catch specific exceptions (e.g., `KeyError`, `IndexError`, `JSONDecodeError`) and optionally log or default values so that real failures aren’t silently swallowed.
- HTTP requests currently don’t specify timeouts or handle network-level errors; consider adding a shared request helper with a reasonable timeout and basic error handling/retries to avoid hanging or crashing on transient network issues.
- In several places when extracting artwork (e.g., in `search()` for artists/albums/songs/playlists/videos), you access `i[0]['artwork']` instead of `i['artwork']`, which is likely a typo and causes exceptions that are then swallowed—clean this up so artwork URLs are reliably parsed.
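The shared request helper suggested above could look like this minimal sketch; `fetch_html` and its `getter` parameter (injectable so the helper can be tested without the network) are hypothetical names, not part of the PR:

```python
import requests

def fetch_html(url, timeout=10, retries=2, getter=requests.get):
    """Fetch a page with a timeout and simple retries.

    `getter` defaults to requests.get but can be swapped out in tests.
    """
    headers = {"User-Agent": "Mozilla/5.0"}
    last_error = None
    for _ in range(retries + 1):
        try:
            rspn = getter(url, headers=headers, timeout=timeout)
            rspn.raise_for_status()
            return rspn.text
        except requests.RequestException as exc:
            last_error = exc  # remember the failure, then retry
    raise last_error
```

Each scraper function would then call `fetch_html(link)` instead of invoking `requests.get` directly, so timeouts and retry policy live in one place.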
## Individual Comments
### Comment 1
<location> `Apple-Music-Scraper/main.py:123` </location>
<code_context>
+ 'playlists':[],
+ 'videos':[]
+ }
+ link = "https://music.apple.com/us/search?term="+keyword
+
+ headers = {
</code_context>
<issue_to_address>
**issue:** The search keyword should be URL-encoded before being concatenated into the query string.
Directly concatenating the raw keyword will break searches for values with spaces, &, +, or non-ASCII characters. Use proper URL encoding, e.g. `urllib.parse.quote_plus(keyword)` and `f"https://music.apple.com/us/search?term={quote_plus(keyword)}"`, so the search works for arbitrary input.
</issue_to_address>
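A minimal sketch of the encoding fix the comment suggests; `build_search_url` is a hypothetical helper name:

```python
from urllib.parse import quote_plus

def build_search_url(keyword):
    # quote_plus encodes spaces as '+' and escapes &, /, + and non-ASCII
    # characters, so arbitrary user input survives the query string intact.
    return f"https://music.apple.com/us/search?term={quote_plus(keyword)}"
```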
### Comment 2
<location> `Apple-Music-Scraper/main.py:155-156` </location>
<code_context>
+ try:
+ image_url = i['artwork']['dictionary']['url']
+ image_width = i['artwork']['dictionary']['width']
+ image_height = i[0]['artwork']['dictionary']['height']
+ artwork = get_cover(image_url, image_width, image_height)
+ except:
+ artwork = ""
</code_context>
<issue_to_address>
**issue (bug_risk):** Artwork height is indexed via `i[0]` instead of `i`, which is likely a bug and will raise at runtime.
Since `i` is the item dict from `for i in artists['items']:`, `i[0]` will fail (TypeError/KeyError) and be swallowed by the bare `except`, causing artwork to be dropped even when present. The same issue appears in the albums, songs, playlists, and videos sections. Accessing `i['artwork']['dictionary']['height']` (like width) avoids this failure and preserves artwork where available.
</issue_to_address>
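Since the same extraction is repeated for artists, albums, songs, playlists, and videos, one small helper would let the `i[0]` typo be fixed in a single place. `extract_artwork` is a hypothetical name, and `get_cover` is passed in as a parameter to keep the sketch self-contained:

```python
def extract_artwork(item, get_cover):
    """Build the artwork URL for one search-result item, or '' if absent."""
    try:
        d = item['artwork']['dictionary']
        # all three fields come from the item itself, never from item[0]
        return get_cover(d['url'], d['width'], d['height'])
    except (KeyError, TypeError):
        return ""
```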
### Comment 3
<location> `Apple-Music-Scraper/main.py:395-403` </location>
<code_context>
+ for i in sections:
+ if "album-detail" in i['id']:
+ album_detail_index = index
+ elif "track-list " in i['id']:
+ track_list_index = index
+ elif "video" in i['id']:
+ video_index = index
+ elif "more" in i['id']:
+ more_index = index
+ elif "you-might-also-like" in i['id']:
+ similar_index = index
+ elif "track-list-section" in i['id']:
+ track_list_section_index = index
+ index+=1
</code_context>
<issue_to_address>
**issue (bug_risk):** The `"track-list "` check includes a trailing space, which likely prevents matching the intended section.
Because of that trailing space, `"track-list " in i['id']` will likely never match, so `track_list_index` may never be set and the later `sections[track_list_index]` access will always fall into the `except` path. Consider matching `"track-list"` instead, and preferably use a stricter check like equality or `startswith` rather than a substring search to make this more robust.
</issue_to_address>
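A small self-contained demonstration of why the trailing space matters, using made-up section ids shaped like the real ones (note that `startswith("track-list")` would also match a `track-list-section` id, so the real code still needs to distinguish the two):

```python
section_ids = ["album-detail-123", "track-list-456", "you-might-also-like-789"]

# The trailing space means this substring test never matches:
assert not any("track-list " in sid for sid in section_ids)

# Dropping the space and using startswith matches the intended section:
matches = [sid for sid in section_ids if sid.startswith("track-list")]
assert matches == ["track-list-456"]
```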
### Comment 4
<location> `Apple-Music-Scraper/main.py:147-156` </location>
<code_context>
+ elif "music_video" in i['id']:
+ videos = i
+
+ try:
+ artists_result = []
+
+ for i in artists['items']:
+ artist = i['title']
+ try:
+ image_url = i['artwork']['dictionary']['url']
+ image_width = i['artwork']['dictionary']['width']
+ image_height = i[0]['artwork']['dictionary']['height']
+ artwork = get_cover(image_url, image_width, image_height)
+ except:
+ artwork = ""
+
+ url = i['contentDescriptor']['url']
+ artists_result.append({'title':artist, 'url':url, 'image':artwork})
+ result['artists'] = artists_result
+
+ except:
+ pass
+
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The widespread use of bare `except:` blocks hides real errors and makes debugging difficult.
Several blocks here wrap large sections of logic in `try: ... except: pass`. This will hide real programming errors (e.g., `KeyError`, `TypeError`, `NameError`), not just missing optional fields, and can silently degrade the scraper’s output. Please catch only the specific exceptions you expect (e.g., `KeyError` / `IndexError` for missing fields), or refactor into smaller helpers with targeted error handling to keep failures visible while still allowing for genuinely optional data.
Suggested implementation:
```python
# Build artists result list with targeted error handling so that missing
# optional fields don't hide real programming errors.
artists_result = []
# Safely get the items list; if artists is None or not a dict, fall back to empty.
try:
artist_items = artists.get("items", []) if artists is not None else []
except AttributeError:
artist_items = []
for item in artist_items:
# Title is expected to be present; if not, skip this item rather than
# hiding a KeyError in a broad try/except.
try:
artist_title = item["title"]
except (KeyError, TypeError):
continue
# Artwork is optional; if any of the nested keys are missing or the
# structure is unexpected, fall back to an empty artwork string.
artwork = ""
try:
artwork_dict = item["artwork"]["dictionary"]
image_url = artwork_dict["url"]
image_width = artwork_dict["width"]
image_height = artwork_dict["height"]
artwork = get_cover(image_url, image_width, image_height)
except (KeyError, TypeError):
artwork = ""
# URL is required for a valid artist entry; if missing, skip the item.
try:
url = item["contentDescriptor"]["url"]
except (KeyError, TypeError):
continue
artists_result.append(
{
"title": artist_title,
"url": url,
"image": artwork,
}
)
if artists_result:
result["artists"] = artists_result
```
- If there are other similar blocks for albums, songs, playlists, or videos that use broad `try: ... except: pass`, they should be updated in the same style:
- Remove outer broad `try/except` that swallows all exceptions.
- Use small, focused `try/except` blocks around genuinely optional fields (e.g., nested artwork dictionaries) and catch only `KeyError`, `TypeError`, or `AttributeError` as appropriate.
- For required fields (such as URLs or titles needed to build a valid result entry), either let the exception surface or `continue`/skip that specific item while still logging or handling it explicitly if the codebase has a logging convention.
</issue_to_address>
### Comment 5
<location> `Apple-Music-Scraper/utils.py:79-88` </location>
<code_context>
+
+ return f"https://music.apple.com/{country}/song/{title}/{song_id}"
+
+def get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534"):
+ """
+ Fetch all singles & EP URLs from an Apple Music artist page.
+
+ Parameters
+ ----------
+ url : str, optional
+ Base artist page URL. Defaults to the sample King Princess artist link.
+
+ Returns
+ -------
+ list[str]
+ A list of Apple Music URLs for all singles & EPs for the artist.
+
+ Notes
+ -----
+ - Apple Music loads singles under the `/see-all?section=singles` endpoint.
+ - This function retrieves the serialized server data, parses the `items` section,
+ and extracts the correct song/EP URLs.
+ - Used internally by `artist_scrape()`.
+ """
+ result = []
+ url = url+"/see-all?section=singles"
+
+ headers = {
</code_context>
<issue_to_address>
**nitpick:** Simple string concatenation for `url` can produce malformed URLs if the base has a trailing slash.
If the caller passes an artist URL with a trailing slash (e.g. `.../1349968534/`), this becomes `.../1349968534//see-all?section=singles`. To avoid malformed URLs, either strip any trailing slash before appending the path segment or use `urllib.parse.urljoin`.
</issue_to_address>
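A sketch of the trailing-slash fix (`singles_url` is a hypothetical helper name). Note that `urllib.parse.urljoin` would actually drop the final path segment here (`urljoin(".../artist/x/123", "see-all")` yields `.../artist/x/see-all`), so stripping the slash is the simpler safe option:

```python
def singles_url(artist_url):
    # Strip any trailing slash so we never emit '...//see-all'
    return artist_url.rstrip("/") + "/see-all?section=singles"
```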
### Comment 6
<location> `Apple-Music-Scraper/main.py:123` </location>
<code_context>
def search(keyword="sasha sloan"):
"""
Search Apple Music for artists, songs, albums, playlists and videos.
Parameters
----------
keyword : str, optional
Search query to send to Apple Music. Defaults to "sasha sloan".
Returns
-------
dict
Structured JSON-like dictionary containing search results:
- artists
- albums
- songs
- playlists
- videos
Notes
-----
Scrapes `serialized-server-data` to access Apple Music's internal search structure.
"""
result = {
'artists':[],
'albums':[],
'songs':[],
'playlists':[],
'videos':[]
}
link = "https://music.apple.com/us/search?term="+keyword
headers = {
"User-Agent": "Mozilla/5.0"
}
rspn = requests.get(link, headers=headers)
soup = BeautifulSoup(rspn.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
sections = our_json[0]['data']['sections']
for i in sections:
if "artist" in i['id']:
artists = i
elif "album" in i['id']:
albums = i
elif "song" in i['id']:
songs = i
elif "playlist" in i['id']:
playlists = i
elif "music_video" in i['id']:
videos = i
try:
artists_result = []
for i in artists['items']:
artist = i['title']
try:
image_url = i['artwork']['dictionary']['url']
image_width = i['artwork']['dictionary']['width']
image_height = i[0]['artwork']['dictionary']['height']
artwork = get_cover(image_url, image_width, image_height)
except:
artwork = ""
url = i['contentDescriptor']['url']
artists_result.append({'title':artist, 'url':url, 'image':artwork})
result['artists'] = artists_result
except:
pass
try:
albums_result = []
for i in albums['items']:
song = i['titleLinks'][0]['title']
artist = i['subtitleLinks'][0]['title']
try:
image_url = i['artwork']['dictionary']['url']
image_width = i['artwork']['dictionary']['width']
image_height = i[0]['artwork']['dictionary']['height']
artwork = get_cover(image_url, image_width, image_height)
except:
artwork = ""
url = i['contentDescriptor']['url']
albums_result.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
result['albums'] = albums_result
except:
pass
try:
songs_result = []
for i in songs['items']:
song = i['title']
artist = i['subtitleLinks'][0]['title']
try:
image_url = i['artwork']['dictionary']['url']
image_width = i['artwork']['dictionary']['width']
image_height = i[0]['artwork']['dictionary']['height']
artwork = get_cover(image_url, image_width, image_height)
except:
artwork = ""
url = i['contentDescriptor']['url']
songs_result.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
result['songs'] = songs_result
except:
pass
try:
playlists_result = []
for i in playlists['items']:
song = i['titleLinks'][0]['title']
artist = i['subtitleLinks'][0]['title']
try:
image_url = i['artwork']['dictionary']['url']
image_width = i['artwork']['dictionary']['width']
image_height = i[0]['artwork']['dictionary']['height']
artwork = get_cover(image_url, image_width, image_height)
except:
artwork = ""
url = i['contentDescriptor']['url']
playlists_result.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
result['playlists'] = playlists_result
except:
pass
try:
videos_results = []
for i in videos['items']:
song = i['titleLinks'][0]['title']
artist = i['subtitleLinks'][0]['title']
try:
image_url = i['artwork']['dictionary']['url']
image_width = i['artwork']['dictionary']['width']
image_height = i[0]['artwork']['dictionary']['height']
artwork = get_cover(image_url, image_width, image_height)
except:
artwork = ""
url = i['contentDescriptor']['url']
videos_results.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
result['videos'] = videos_results
except:
pass
return result
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
- Extract duplicate code into function ([`extract-duplicate-method`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/extract-duplicate-method/))
- Use `except Exception:` rather than bare `except:` [×6] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
</issue_to_address>
### Comment 7
<location> `Apple-Music-Scraper/main.py:256` </location>
<code_context>
def song_scrape(url="https://music.apple.com/us/song/california/1821538031"):
"""
Scrape a single Apple Music song page and extract metadata.
Parameters
----------
url : str, optional
URL of the Apple Music song. Defaults to sample link.
Returns
-------
dict
Dictionary containing:
- title
- image (full resolution)
- kind (song type)
- album info (title + URL)
- artist info (title + URL)
- preview-url
- list of more songs
Notes
-----
Uses the `schema:song` JSON-LD tag to extract preview URL.
"""
result = {
'title':'',
'image':'',
'kind':'',
'album': {
'title':'',
'url':''
},
'artist': {
'title':'',
'url':''
},
'more':[],
'preview-url':''
}
rspn = requests.get(url)
soup = BeautifulSoup(rspn.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
song_details = our_json[0]['data']['sections'][0]
result['title'] = song_details['items'][0]['title']
image_url = song_details['items'][0]['artwork']['dictionary']['url']
image_width = song_details['items'][0]['artwork']['dictionary']['width']
image_height = song_details['items'][0]['artwork']['dictionary']['height']
result['image'] = get_cover(image_url, image_width, image_height)
result['kind'] = song_details['presentation']['kind']
result['album']['title'] = song_details['items'][0]['album']
result['album']['url'] = song_details['items'][0]['albumLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
result['artist']['title'] = song_details['items'][0]['artists']
result['artist']['url'] = song_details['items'][0]['artistLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
json_tag = soup.find("script", {"id": "schema:song", "type": "application/ld+json"})
data = json.loads(json_tag.string)
preview_url = data['audio']['audio']['contentUrl']
result['preview-url'] = preview_url
more_songs = our_json[0]['data']['sections'][-1]['items']
more_songs_list = []
for i in more_songs:
more_songs_list.append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
result['more'] = more_songs_list
return result
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Convert for loop into list comprehension ([`list-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/list-comprehension/))
- Move assignment closer to its usage within a block [×2] ([`move-assign-in-block`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign-in-block/))
- Merge dictionary assignment with declaration [×2] ([`merge-dict-assign`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/merge-dict-assign/))
</issue_to_address>
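The `more` loop flagged above can become a list comprehension; `collect_action_urls` is a hypothetical helper name:

```python
def collect_action_urls(items):
    """List-comprehension form of the 'more songs' loop in song_scrape."""
    return [
        i['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
        for i in items
    ]
```

`result['more'] = collect_action_urls(more_songs)` then replaces the three-line append loop, and the same helper could serve the many identical loops elsewhere in the module.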
### Comment 8
<location> `Apple-Music-Scraper/main.py:334` </location>
<code_context>
def album_scrape(url="https://music.apple.com/us/album/1965/1817707266?i=1817707585"):
"""
Scrape an Apple Music album page and extract metadata, songs, related albums, videos, etc.
Parameters
----------
url : str, optional
URL of the Apple Music album. Defaults to example album.
Returns
-------
dict
Dictionary containing:
- title
- image
- caption/description
- artist info
- song URLs
- album info text
- more songs (same artist)
- similar (recommended) albums
- videos related to the album
Notes
-----
Extracts multiple sections such as:
- album-detail
- track-list
- similar albums
- more by artist
- album videos
"""
result = {
'title':'',
'image':'',
'caption':'',
'artist': {
'title':'',
'url':''
},
'songs':[],
'info':'',
'more':[],
'similar':[],
'videos':[]
}
headers = {
"User-Agent": "Mozilla/5.0"
}
rspn = requests.get(url, headers=headers)
soup = BeautifulSoup(rspn.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
sections = our_json[0]['data']['sections']
index=0
for i in sections:
if "album-detail" in i['id']:
album_detail_index = index
elif "track-list " in i['id']:
track_list_index = index
elif "video" in i['id']:
video_index = index
elif "more" in i['id']:
more_index = index
elif "you-might-also-like" in i['id']:
similar_index = index
elif "track-list-section" in i['id']:
track_list_section_index = index
index+=1
try:
result['title'] = sections[album_detail_index]['items'][0]['title']
except:
pass
try:
image_url = sections[album_detail_index]['items'][0]['artwork']['dictionary']['url']
image_width = sections[album_detail_index]['items'][0]['artwork']['dictionary']['width']
image_height = sections[album_detail_index]['items'][0]['artwork']['dictionary']['height']
result['image'] = get_cover(image_url, image_width, image_height)
except:
pass
try:
result['caption'] = sections[album_detail_index]['items'][0]['modalPresentationDescriptor']['paragraphText']
except:
pass
try:
result['artist']['title'] = sections[album_detail_index]['items'][0]['subtitleLinks'][0]['title']
result['artist']['url'] = sections[album_detail_index]['items'][0]['subtitleLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
except:
pass
try:
album_songs = sections[track_list_index]['items']
for i in album_songs:
result['songs'].append(convert_album_to_song_url(i['contentDescriptor']['url']))
except:
pass
try:
result['info'] = sections[track_list_section_index]['items'][0]['description']
more_songs = sections[more_index]['items']
for i in more_songs:
result['more'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
similar_songs = sections[similar_index]['items']
for i in similar_songs:
result['similar'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
videos = sections[video_index]['items']
for i in videos:
result['videos'].append(i['contentDescriptor']['url'])
except:
pass
return result
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Replace manual loop counter with call to enumerate ([`convert-to-enumerate`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/convert-to-enumerate/))
- Use `except Exception:` rather than bare `except:` [×8] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
- Low code quality found in album\_scrape - 21% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
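One way to address both the manual counter and the repeated `elif` branches, sketched with a hypothetical `index_sections` helper. The prefix list mirrors the ids used in `album_scrape`; ordering `track-list-section` before `track-list` keeps the two from shadowing each other:

```python
def index_sections(sections):
    """Map known section-id prefixes to their index in one enumerate pass,
    replacing the manual `index += 1` counter."""
    prefixes = ("album-detail", "track-list-section", "track-list",
                "video", "more", "you-might-also-like")
    found = {}
    for index, section in enumerate(sections):
        for prefix in prefixes:
            if section['id'].startswith(prefix) and prefix not in found:
                found[prefix] = index
                break
    return found
```

Callers would then use `found.get("track-list")` and skip that block when the section is absent, instead of relying on a `NameError` being swallowed by a bare `except:`.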
### Comment 9
<location> `Apple-Music-Scraper/main.py:518` </location>
<code_context>
def video_scrape(url="https://music.apple.com/us/music-video/gucci-mane-visualizer/1810547026"):
"""
Scrape Apple Music music-video page and extract metadata + video file URL.
Parameters
----------
url : str, optional
URL of the Apple Music music-video. Defaults to example.
Returns
-------
dict
{
title,
image,
artist: {title, url},
video-url,
more (same artist),
similar (same genre)
}
Notes
-----
Uses JSON-LD block `schema:music-video` to extract the direct video content URL.
"""
result = {
'title': '',
'image': '',
'artist': {
'title': '',
'url': ''
},
'video-url': '',
'more': [],
'similar':[]
}
headers = {
"User-Agent": "Mozilla/5.0"
}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
sections = our_json[0]['data']['sections']
for i in sections:
if "music-video-header" in i['id']:
music_video_header = i
elif "more-by-artist" in i['id']:
more = i
elif "more-in-genre" in i['id']:
similar = i
try:
result['title'] = music_video_header['items'][0]['title']
except:
pass
try:
image_url = music_video_header['items'][0]['artwork']['dictionary']['url']
image_width = music_video_header['items'][0]['artwork']['dictionary']['width']
image_height = music_video_header['items'][0]['artwork']['dictionary']['height']
result['image'] = get_cover(image_url, image_width, image_height)
except:
pass
try:
result['artist']['title'] = music_video_header['items'][0]['subtitleLinks'][0]['title']
result['artist']['url'] = music_video_header['items'][0]['subtitleLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
except:
pass
try:
json_tag = soup.find("script", {"id": "schema:music-video", "type": "application/ld+json"})
data = json.loads(json_tag.string)
result['video-url'] = data['video']['contentUrl']
except:
pass
try:
for i in more['items']:
result['more'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in similar['items']:
result['similar'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
return result
</code_context>
<issue_to_address>
**issue (code-quality):** Use `except Exception:` rather than bare `except:` [×6] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
</issue_to_address>
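A generic nested-lookup helper could replace most of these bare `except:` blocks while still catching only the expected lookup failures; `dig` is a hypothetical name:

```python
def dig(obj, *path, default=None):
    """Walk a nested dict/list path, returning `default` on any missing step."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj
```

For example, `result['title'] = dig(music_video_header, 'items', 0, 'title', default='')` keeps the optional-field behavior without hiding unrelated errors.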
### Comment 10
<location> `Apple-Music-Scraper/main.py:558` </location>
<code_context>
def artist_scrape(url="https://music.apple.com/us/artist/king-princess/1349968534"):
"""
Scrape an Apple Music artist page and extract all available metadata.
Parameters
----------
url : str, optional
Apple Music artist page URL. Defaults to King Princess sample link.
Returns
-------
dict
Dictionary containing:
- title
- image
- latest release URL
- list of top songs
- all albums
- singles & EPs
- playlists
- videos
- similar artists
- appears on
- more-to-see (videos)
- more-to-hear (songs)
- about text
- extra info (bio subtitle)
Notes
-----
This is the most complex scraper and extracts ~12 different sections
from the artist page.
"""
result = {
'title':'',
'image':'',
'latest':'',
'top':[],
'albums':[],
'singles_and_EP':[],
'playlists':[],
'videos':[],
'similar':[],
'appears_on':[],
'more_to_see':[],
'more_to_hear':[],
'about':'',
'info':'',
}
headers = {
"User-Agent": "Mozilla/5.0"
}
rspn = requests.get(url, headers=headers)
soup = BeautifulSoup(rspn.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
sections = our_json[0]['data']['sections']
for i in sections:
if "artist-detail-header-section" in i['id']:
artist_detail = i
elif "latest-release-and-top-songs" in i['id']:
latest_and_top = i
elif "full-albums" in i['id']:
albums = i
elif "playlists" in i['id']:
playlists = i
elif "music-videos" in i['id']:
videos = i
elif "singles" in i['id']:
singles = i
elif "appears-on" in i['id']:
appears_on = i
elif "more-to-see" in i['id']:
more_to_see = i
elif "more-to-hear" in i['id']:
more_to_hear = i
elif "artist-bio" in i['id']:
bio = i
elif "similar-artists" in i['id']:
similar = i
try:
result['title'] = artist_detail['items'][0]['title']
except:
pass
try:
image_url = artist_detail['items'][0]['artwork']['dictionary']['url']
image_width = artist_detail['items'][0]['artwork']['dictionary']['width']
image_height = artist_detail['items'][0]['artwork']['dictionary']['height']
result['image'] = get_cover(image_url, image_width, image_height)
except:
pass
try:
result['latest'] = latest_and_top['pinnedLeadingItem']['item']['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
except:
pass
try:
for i in latest_and_top['items']:
result['top'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in albums['items']:
result['albums'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
result['singles_and_EP'] = get_all_singles(url)
except:
pass
try:
for i in playlists['items']:
result['playlists'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in videos['items']:
result['videos'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in similar['items']:
result['similar'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in appears_on['items']:
result['appears_on'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in more_to_see['items']:
result['more_to_see'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in more_to_hear['items']:
result['more_to_hear'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
result['about'] = bio['items'][0]['modalPresentationDescriptor']['paragraphText']
except:
pass
try:
result['info'] = bio['items'][0]['modalPresentationDescriptor']['headerSubtitle']
except:
pass
return result
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use `except Exception:` rather than bare `except:` [×14] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
- Low code quality found in artist\_scrape - 10% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
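One possible shape for that refactor, sketched with hypothetical helper names: a table maps section-id substrings to result keys, and a single extraction function replaces the dozen near-identical try/except blocks in `artist_scrape`.

```python
# Hypothetical refactor sketch; names and structure are illustrative only.
SECTION_KEYS = {
    'full-albums': 'albums',
    'playlists': 'playlists',
    'music-videos': 'videos',
    'similar-artists': 'similar',
    'appears-on': 'appears_on',
    'more-to-see': 'more_to_see',
    'more-to-hear': 'more_to_hear',
}

def action_urls(section):
    """Extract every actionUrl from a section, skipping malformed items."""
    urls = []
    for item in section.get('items', []):
        try:
            urls.append(item['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
        except Exception:
            continue
    return urls

def collect_sections(sections):
    """Route each section to its result key via the SECTION_KEYS table."""
    result = {key: [] for key in SECTION_KEYS.values()}
    for section in sections:
        for marker, key in SECTION_KEYS.items():
            if marker in section.get('id', ''):
                result[key] = action_urls(section)
    return result
```

This keeps the tolerant behaviour of the original (malformed items are skipped rather than aborting the scrape) while shrinking the function well under the reviewer's length threshold; the header, latest-release, and bio fields would still need their own small handlers.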
### Comment 11
<location> `Apple-Music-Scraper/utils.py:80` </location>
<code_context>
def get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534"):
"""
Fetch all singles & EP URLs from an Apple Music artist page.
Parameters
----------
url : str, optional
Base artist page URL. Defaults to the sample King Princess artist link.
Returns
-------
list[str]
A list of Apple Music URLs for all singles & EPs for the artist.
Notes
-----
- Apple Music loads singles under the `/see-all?section=singles` endpoint.
- This function retrieves the serialized server data, parses the `items` section,
and extracts the correct song/EP URLs.
- Used internally by `artist_scrape()`.
"""
result = []
url = url+"/see-all?section=singles"
headers = {
"User-Agent": "Mozilla/5.0"
}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
sections = our_json[0]['data']['sections'][0]['items']
for i in sections:
result.append((i['segue']['actionMetrics']['data'][0]['fields']['actionUrl']))
return result
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Move assignment closer to its usage within a block ([`move-assign-in-block`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign-in-block/))
- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
- Convert for loop into list comprehension ([`list-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/list-comprehension/))
- Inline variable that is immediately returned ([`inline-immediately-returned-variable`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/inline-immediately-returned-variable/))
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Apple-Music-Scraper/main.py (outdated):

        'playlists':[],
        'videos':[]
    }
    link = "https://music.apple.com/us/search?term="+keyword
issue: The search keyword should be URL-encoded before being concatenated into the query string.
Directly concatenating the raw keyword will break searches for values with spaces, &, +, or non-ASCII characters. Use proper URL encoding, e.g. urllib.parse.quote_plus(keyword) and f"https://music.apple.com/us/search?term={quote_plus(keyword)}", so the search works for arbitrary input.
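A sketch of the encoded version using the standard library's `quote_plus` (the function name is illustrative):

```python
from urllib.parse import quote_plus

def search_url(keyword):
    # URL-encode the user-supplied term so spaces, '&', '+', '/', and
    # non-ASCII characters survive in the query string
    return f"https://music.apple.com/us/search?term={quote_plus(keyword)}"

search_url("night tapes")      # -> ...term=night+tapes
search_url("AC/DC & friends")  # -> ...term=AC%2FDC+%26+friends
```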
Apple-Music-Scraper/main.py (outdated):

    elif "track-list " in i['id']:
        track_list_index = index
    elif "video" in i['id']:
        video_index = index
    elif "more" in i['id']:
        more_index = index
    elif "you-might-also-like" in i['id']:
        similar_index = index
    elif "track-list-section" in i['id']:
issue (bug_risk): The "track-list " check includes a trailing space, which likely prevents matching the intended section.
Because of that trailing space, "track-list " in i['id'] will likely never match, so track_list_index may never be set and the later sections[track_list_index] access will always fall into the except path. Consider matching "track-list" instead, and preferably use a stricter check like equality or startswith rather than a substring search to make this more robust.
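A stricter matcher might look like this (hypothetical helper, not the PR's code):

```python
def section_index(sections, prefix):
    """Return the index of the first section whose id starts with `prefix`,
    or None. startswith avoids both the trailing-space bug and accidental
    substring matches (e.g. "more" also matching "more-to-hear")."""
    for index, section in enumerate(sections):
        if section.get('id', '').startswith(prefix):
            return index
    return None

section_index([{'id': 'video-1'}, {'id': 'track-list-abc'}], 'track-list')  # -> 1
```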
Apple-Music-Scraper/utils.py:

    def get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534"):
        """
        Fetch all singles & EP URLs from an Apple Music artist page.

        Parameters
        ----------
        url : str, optional
            Base artist page URL. Defaults to the sample King Princess artist link.

        Returns
nitpick: Simple string concatenation for url can produce malformed URLs if the base has a trailing slash.
If the caller passes an artist URL with a trailing slash (e.g. .../1349968534/), this becomes .../1349968534//see-all?section=singles. To avoid malformed URLs, either strip any trailing slash before appending the path segment or use urllib.parse.urljoin.
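A defensive sketch of that join (hypothetical helper name):

```python
def singles_section_url(base):
    # Strip any trailing slash before appending the see-all path segment,
    # so ".../1349968534" and ".../1349968534/" build the same URL;
    # urllib.parse.urljoin would work too if the base ends with "/"
    return base.rstrip('/') + "/see-all?section=singles"
```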
Description
This PR adds a brand-new Apple Music Web Scraper capable of scraping songs, albums, playlists, artists, videos, rooms, and full search results.
It parses Apple Music's internal `serialized-server-data` JSON structure and converts it into clean Python output. This feature did NOT exist in the repository before and expands the Scrapping/Social Media category significantly.
What’s Included:
- `apple_music_scraper.py` – Main scraper logic
- `utils.py` – Helper methods (cover resolver, URL converter, etc.)
- `README.md` – Full documentation + examples
- `requirements.txt` – Clean dependency list (`requests`, `beautifulsoup4`)

Fixes #none
No existing issue was referenced; this is a brand-new standalone feature.
Type of change
Checklist:
- `README.md` (Template for README.md)
- `requirements.txt` file if needed

Project Metadata
Category:
Title: Apple Music Web Scraper
Folder: `Apple-Music-Scraper`
Requirements: `requirements.txt`
Script: `apple_music_scraper.py`
Arguments: none
Contributor: `abssdghi`
Description: A powerful and fully-featured Apple Music scraper that extracts songs, albums, playlists, videos, artist pages, and full search results using Apple Music's internal structured JSON data.
Summary by Sourcery
Add a new Apple Music web scraper module that extracts structured metadata from various Apple Music web pages using their embedded serialized JSON data.
New Features: